NSF PAR Search | NSF Public Access Repository

Federated target trial emulation using distributed observational data for treatment effect estimation

https://doi.org/10.1038/s41746-025-01803-y

Li, Haoyang; Zang, Chengxi; Xu, Zhenxing; Pan, Weishen; Rajendran, Suraj; Chen, Yong; Wang, Fei (December 2025, npj Digital Medicine)

Full Text Available
Multicenter target trial emulation to evaluate corticosteroids for sepsis stratified by predicted organ dysfunction trajectory

https://doi.org/10.1038/s41467-025-59643-z

Rajendran, Suraj; Xu, Zhenxing; Pan, Weishen; Zang, Chengxi; Siempos, Ilias; Torres, Lisa; Xu, Jie; Bian, Jiang; Schenck, Edward J; Wang, Fei (December 2025, Nature Communications)

Full Text Available
Unified Insights: Harnessing Multi-modal Data for Phenotype Imputation via View Decoupling

Zhang, Qiannan; Pan, Weishen; Bai, Zilong; Su, Chang; Wang, Fei (June 2025, Advances in Neural Information Processing Systems)

Full Text Available
Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs

Maasch, Jacqueline RMA; Pan, Weishen; Gupta, Shantanu; Kuleshov, Volodymyr; Gan, Kyra; Wang, Fei (July 2024, Proceedings of The 40th Conference on Uncertainty in Artificial Intelligence)

Full Text Available
An adaptive federated learning framework for clinical risk prediction with electronic health records from multiple hospitals

https://doi.org/10.1016/j.patter.2023.100898

Pan, Weishen; Xu, Zhenxing; Rajendran, Suraj; Wang, Fei (January 2024, Patterns)

Full Text Available
Learning across diverse biomedical data modalities and cohorts: Challenges and opportunities for innovation

https://doi.org/10.1016/j.patter.2023.100913

Rajendran, Suraj; Pan, Weishen; Sabuncu, Mert R; Chen, Yong; Zhou, Jiayu; Wang, Fei (February 2024, Patterns)

Full Text Available
InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models

Wang, Yingheng; Schiff, Yair; Gokaslan, Aaron; Pan, Weishen; Wang, Fei; De Sa, Christopher; Kuleshov, Volodymyr (July 2023, International Conference on Machine Learning)
Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care

https://doi.org/10.1371/journal.pdig.0000117

Rajendran, Suraj; Xu, Zhenxing; Pan, Weishen; Ghosh, Arnab; Wang, Fei (March 2023, PLOS Digital Health)
Frasch, Martin G. (Ed.)
With the wider availability of healthcare data such as Electronic Health Records (EHR), a growing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims to build computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, privacy concerns about healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich, diverse data from multiple clinical institutions. Recently, federated learning (FL) has demonstrated promise in addressing this concern. However, data heterogeneity across local participating sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), AI-based early prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data across multiple hospitals. The local framework used only data from each site itself. The pooled framework combined data from all sites. In the FL framework, each local site did not have access to other sites’ data. A model was updated locally, and its parameters were shared with a central aggregator, which used them to update the federated model’s parameters and then shared the updated parameters with each site. We found that models built within an FL framework outperformed their local counterparts. Then, we analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of the heterogeneity within the EHR data. The different distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
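
The update loop described in this abstract (local training, parameter sharing with a central aggregator, redistribution to sites) can be illustrated with a minimal sketch. It rests on assumptions that are not in the record: a logistic-regression model, plain gradient descent, and sample-size-weighted averaging in the style of FedAvg. The function names `local_update`, `federated_average`, and `run_federated_rounds` are hypothetical and do not come from the paper.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of logistic regression on one site's data."""
    w = weights.copy()
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-X @ w))   # predicted risk per patient
        grad = X.T @ (probs - y) / len(y)      # mean log-loss gradient
        w -= lr * grad
    return w

def federated_average(site_weights, site_sizes):
    """Aggregate site parameters, weighting each site by its sample count.

    Assumption: FedAvg-style weighting; the abstract does not state the rule used.
    """
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()

def run_federated_rounds(sites, n_features, rounds=20):
    """sites is a list of (X, y) pairs; raw data never leaves local_update."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        # Each site starts from the current federated parameters.
        updates = [local_update(global_w, X, y) for X, y in sites]
        # Only parameters, not patient records, reach the aggregator.
        global_w = federated_average(updates, [len(y) for _, y in sites])
    return global_w
```

In this sketch, a local-framework baseline would correspond to calling `local_update` repeatedly on one site only, and a pooled baseline to concatenating all sites' arrays before training.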
Full Text Available
Generalizability of a Machine Learning Model for Improving Utilization of Parathyroid Hormone-Related Peptide Testing across Multiple Clinical Centers

https://doi.org/10.1093/clinchem/hvad141

Yang, He S; Pan, Weishen; Wang, Yingheng; Zaydman, Mark A; Spies, Nicholas C; Zhao, Zhen; Guise, Theresa A; Meng, Qing H; Wang, Fei (September 2023, Clinical Chemistry)

Abstract. Background: Measuring parathyroid hormone-related peptide (PTHrP) helps diagnose the humoral hypercalcemia of malignancy, but is often ordered for patients with low pretest probability, resulting in poor test utilization. Manual review of results to identify inappropriate PTHrP orders is a cumbersome process. Methods: Using a dataset of 1330 patients from a single institute, we developed a machine learning (ML) model to predict abnormal PTHrP results. We then evaluated the performance of the model on two external datasets. Different strategies (model transporting, retraining, rebuilding, and fine-tuning) were investigated to improve model generalizability. Maximum mean discrepancy (MMD) was adopted to quantify the shift of data distributions across different datasets. Results: The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.936, and a specificity of 0.842 at 0.900 sensitivity in the development cohort. Directly transporting this model to two external datasets resulted in a deterioration of AUROC to 0.838 and 0.737, with the latter having a larger MMD corresponding to a greater data shift compared to the original dataset. Model rebuilding using site-specific data improved AUROC to 0.891 and 0.837 on the two sites, respectively. When external data is insufficient for retraining, a fine-tuning strategy also improved model utility. Conclusions: ML offers promise to improve PTHrP test utilization while relieving the burden of manual review. Transporting a ready-made model to external datasets may lead to performance deterioration due to data distribution shift. Model retraining or rebuilding could improve generalizability when there are enough data, and model fine-tuning may be favorable when site-specific data is limited.
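
As a rough illustration of how maximum mean discrepancy can quantify distribution shift between two datasets, the sketch below computes the biased estimate of squared MMD with a Gaussian (RBF) kernel. The kernel choice, the `gamma` bandwidth, and the function names `rbf_kernel` and `mmd_squared` are assumptions made for illustration; the abstract does not state which kernel or estimator the authors used.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negatives from rounding

def mmd_squared(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y (rows are patients).

    Assumption: RBF kernel with a fixed bandwidth; larger values suggest a larger shift.
    """
    k_xx = rbf_kernel(X, X, gamma).mean()
    k_yy = rbf_kernel(Y, Y, gamma).mean()
    k_xy = rbf_kernel(X, Y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

Under these assumptions, comparing a development cohort against each external dataset with `mmd_squared` would give a larger value for the site whose feature distribution differs more, which is the kind of ordering the abstract reports alongside the AUROC drop.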
Full Text Available
